R Beginners Course

Introduction to R - Part 2

Objective and contents

Objective: to build basic knowledge of importing, modifying, writing and plotting data, as well as writing loops, using the R Project for Statistical Computing.

Contents:

  • Importing and exporting data
  • Basic plots with R
  • Relational operators
  • Iterative processes

Importing and exporting data

Importing data

To import data into R, we can use the read.table function, which returns a data frame.

read.table(file, header = FALSE, sep = "",
           dec = ".", row.names, col.names, na.strings = "NA",
           skip = 0, stringsAsFactors = FALSE, ...)

file: the name (or path) of the file from which the data are read.

header: a logical value indicating whether the file contains the names of the variables as its first line.

sep: the field separator character. Values on each line of the file are separated by this character. With the default sep = "", any white space (one or more spaces, tabs or newlines) separates the fields.

dec: the character used in the file for decimal points.

na.strings: a character vector of strings which are to be interpreted as NA values.

stringsAsFactors: [Logical] should character vectors be converted to factors?

Remember: if we omit an argument, its default value is used.
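To see how sep, dec and na.strings work together without needing a file, read.table can also read from a character string through its text argument. The data below are a small made-up example (semicolon separators, comma decimals and "-" for missing values). They are not part of the course data.

```r
# Made-up example data: ";" separates fields, "," marks decimals, "-" marks missing values
txt <- "Year;Rain;Temp
2010;1250,5;12,3
2011;-;13,1"

df <- read.table(text = txt, header = TRUE, sep = ";",
                 dec = ",", na.strings = "-")
str(df)  # Year is integer; Rain and Temp are numeric, with NA for the "-" entry
```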

Let's create a table in Excel and save it as a CSV (comma-separated values) file.

We then pass the path of the file to the read.table function.

data.path <- "C:/Users/Example/Data.csv"
p_eta_table <- read.table(data.path, header = TRUE, sep = ",")
print(p_eta_table)
##   Year Precipitation Evapotranspiration
## 1 2010          1250               1050
## 2 2011          1952               1748
## 3 2012          1328               1047
## 4 2013          1459               1284
## 5 2014          1642               1421
## 6 2015          1743               1484
## 7 2016          1238                998
## 8 2017          1432               1259
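If the CSV file is not at hand, you can type the same data frame in directly with data.frame(), using the values printed above. This is only a convenience for following along and is not part of the original workflow.

```r
# Recreate the precipitation/evapotranspiration table from the printed values
p_eta_table <- data.frame(
  Year = 2010:2017,
  Precipitation = c(1250, 1952, 1328, 1459, 1642, 1743, 1238, 1432),
  Evapotranspiration = c(1050, 1748, 1047, 1284, 1421, 1484, 998, 1259)
)
print(p_eta_table)
```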

We could also use the read.csv function:

p_eta_table <- read.csv(data.path)
print(p_eta_table)
##   Year Precipitation Evapotranspiration
## 1 2010          1250               1050
## 2 2011          1952               1748
## 3 2012          1328               1047
## 4 2013          1459               1284
## 5 2014          1642               1421
## 6 2015          1743               1484
## 7 2016          1238                998
## 8 2017          1432               1259

Read Table, CSV and CSV2 - Default Settings

The read.table, read.csv and read.csv2 functions can all be used to import data. Note that read.csv and read.csv2 are based on read.table but have different default settings:

             Function       header    sep    dec
             read.table     FALSE     ""     "."
             read.csv       TRUE      ","    "."
             read.csv2      TRUE      ";"    ","
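One way to check these defaults is to read the same semicolon-separated text with read.csv2 and with read.table given the equivalent settings. The text is a made-up two-row example. read.csv2 also changes a few other defaults (such as quote and fill) that do not matter for these data.

```r
# Made-up data in the "European" CSV style: ";" separators and "," decimals
txt <- "Year;Precipitation
2010;1250,5
2011;1952,0"

a <- read.csv2(text = txt)
b <- read.table(text = txt, header = TRUE, sep = ";", dec = ",")

all.equal(a, b)  # TRUE: both calls give the same data frame
```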

Working with data frames

dim: returns the number of rows and columns (i.e. the dimensions) of the object.

dim(p_eta_table)
## [1] 8 3

Now we can add a new column to the existing data frame. As an example, we will calculate the values of precipitation minus evapotranspiration:

p_eta_table$P_minus_ETa <- p_eta_table$Precipitation - p_eta_table$Evapotranspiration

print(p_eta_table)
##   Year Precipitation Evapotranspiration P_minus_ETa
## 1 2010          1250               1050         200
## 2 2011          1952               1748         204
## 3 2012          1328               1047         281
## 4 2013          1459               1284         175
## 5 2014          1642               1421         221
## 6 2015          1743               1484         259
## 7 2016          1238                998         240
## 8 2017          1432               1259         173

Data frame elements are accessed in the same way as matrix elements, using [row, column]. For example, to extract the value located in the second row and fourth column:

p_eta_table[2, 4]
## [1] 204

Or:

p_eta_table$P_minus_ETa[2]
## [1] 204

We can also extract the first three values of the third row:

p_eta_table[3, 1:3]
##   Year Precipitation Evapotranspiration
## 3 2012          1328               1047

Or select several rows at once (here rows 7, 2 and 4, in that order):

p_eta_table[c(7, 2, 4), ]
##   Year Precipitation Evapotranspiration P_minus_ETa
## 7 2016          1238                998         240
## 2 2011          1952               1748         204
## 4 2013          1459               1284         175
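Rows and columns can also be selected by name, and a negative index drops elements instead of keeping them. The sketch below rebuilds a two-column version of the table from the printed values so that it runs on its own.

```r
# Small data frame rebuilt from the values printed above
p_eta_table <- data.frame(
  Year = 2010:2017,
  P_minus_ETa = c(200, 204, 281, 175, 221, 259, 240, 173)
)

p_eta_table[2, "P_minus_ETa"]                      # 204, same as p_eta_table[2, 2]
p_eta_table[c(7, 2, 4), c("Year", "P_minus_ETa")]  # chosen rows, columns by name
p_eta_table[-1, ]                                  # a negative index drops row 1
```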

Export data

For exporting data, we can use the write.table function, which writes a data frame to a file or connection.

write.table(x, file = "", append = FALSE, quote = TRUE, sep = " ",
            na = "NA", dec = ".", row.names = TRUE, col.names = TRUE, ...)

x: the object to be written. Preferably a matrix or data frame.

file: the file path where we want to save the file (including file name and extension).

append: [Logical] only relevant if file is a character string (a file name). If TRUE, the output is appended to the file; if FALSE, any existing file with that name is overwritten.

quote: [Logical] if TRUE, any character or factor columns will be surrounded by double quotes.

sep: the field separator string. Values within each row of x are separated by this string.

na: the string to use for missing values in the data.

dec: the string to use for decimal points in numeric or complex columns: must be a single character.

row.names and col.names: either a logical value indicating whether the row (or column) names of x are to be written along with x, or a character vector of names to be written.

Now, we will save the table that we have edited.

write.table(p_eta_table, paste0(dirname(data.path), "/table_modified.csv"),
row.names = FALSE, sep = ",")

Now, you can go to the corresponding folder and check whether the file has been written.
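Instead of checking the folder by hand, you can check from R with file.exists() and read the file back with read.csv(). The sketch below writes to R's temporary directory (tempdir()) so that it runs without the course's file paths. The small data frame is a subset of the edited table.

```r
# Subset of the edited table (values from the printed output above)
df <- data.frame(Year = 2010:2012, P_minus_ETa = c(200, 204, 281))

# Write to a file in the temporary directory, without row names
out.file <- file.path(tempdir(), "table_modified.csv")
write.table(df, out.file, row.names = FALSE, sep = ",")

file.exists(out.file)  # TRUE if the file was written
back <- read.csv(out.file)
print(back)
```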

Additionally, we can save a single R object to a file with the saveRDS function and restore it later with the readRDS function.

saveRDS(p_eta_table, paste0(dirname(data.path), "/table_modified.rds"))
x <- readRDS(paste0(dirname(data.path), "/table_modified.rds"))
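Unlike a CSV file, an RDS file stores the R object itself, so column types (for example integer versus double) come back unchanged. The sketch below uses tempfile() instead of the course's data path so that it runs anywhere.

```r
df <- data.frame(Year = 2010:2012, P_minus_ETa = c(200, 204, 281))

rds.file <- tempfile(fileext = ".rds")  # temporary file, so no real path is needed
saveRDS(df, rds.file)
restored <- readRDS(rds.file)

identical(df, restored)  # TRUE: the object is restored exactly
```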

Joining tables

We can use the rbind function to combine the rows of two separate tables into one table. For this example, we will load tables with the most populous cities of Spain and Italy.

spain.pop <- read.csv("...add file path…/Cities/Spain.csv", header = TRUE,
                      fileEncoding = "latin1")  # accented names suggest Latin-1 encoding
italy.pop <- read.csv("...add file path…/Cities/Italy.csv", header = TRUE,
                      fileEncoding = "latin1")
print(spain.pop)
##                          City Population
## 1                      Madrid    3334730
## 2                   Barcelona    1664182
## 3                    València     800215
## 4                     Sevilla     691395
## 5                    Zaragoza     681877
## 6                      Málaga     578460
## 7                      Murcia     459403
## 8                       Palma     422587
## 9  Las Palmas de Gran Canaria     381223
## 10                     Bilbao     350184

First, let’s add information about the country to the table, so that the combined table will include the country for each entry.

spain.pop$Country <- rep("Spain", times = nrow(spain.pop))
italy.pop$Country <- rep("Italy", times = nrow(italy.pop))
print(spain.pop)
##                          City Population Country
## 1                      Madrid    3334730   Spain
## 2                   Barcelona    1664182   Spain
## 3                    València     800215   Spain
## 4                     Sevilla     691395   Spain
## 5                    Zaragoza     681877   Spain
## 6                      Málaga     578460   Spain
## 7                      Murcia     459403   Spain
## 8                       Palma     422587   Spain
## 9  Las Palmas de Gran Canaria     381223   Spain
## 10                     Bilbao     350184   Spain
print(italy.pop)
##      City Population Country
## 1    Roma    2808293   Italy
## 2  Milano    1406242   Italy
## 3  Napoli     948850   Italy
## 4  Torino     857910   Italy
## 5 Palermo     647422   Italy
## 6  Genova     565752   Italy
## 7 Bologna     395416   Italy
## 8 Firenze     366927   Italy
## 9    Bari     315284   Italy

Now we can use the rbind function to join the tables.

pop.table <- rbind(spain.pop, italy.pop)

Similarly, the cbind function will combine objects by columns.
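The sketch below uses both functions on two small tables built from a few of the cities listed above. It also shows that assigning a single value to a new column recycles it to every row, which has the same effect as the rep(..., times = nrow(...)) calls used earlier. The Rank column is an illustrative addition, not part of the course data.

```r
# Small tables with values taken from the city lists above
spain <- data.frame(City = c("Madrid", "Barcelona"),
                    Population = c(3334730, 1664182))
italy <- data.frame(City = c("Roma", "Milano"),
                    Population = c(2808293, 1406242))

# A single value is recycled to every row, like rep(..., times = nrow(...))
spain$Country <- "Spain"
italy$Country <- "Italy"

# rbind stacks rows; both tables must have the same column names
pop.table <- rbind(spain, italy)

# cbind adds columns; here a population rank (1 = largest)
pop.table <- cbind(pop.table, Rank = rank(-pop.table$Population))
print(pop.table)
```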

Basic plots with R

The plot function is a generic function: the type of plot produced depends on the type or class of the first argument.

plot(x, y, type, main, sub, xlab, ylab)

x: the coordinates of points in the plot (x-axis)

y: the coordinates of points in the plot (y-axis)

main: a title for the plot

sub: a subtitle for the plot

xlab and ylab: titles for the x and y axes

Note: y can be omitted if x has the appropriate structure (e.g., for a raster file)

type: what type of plot should be drawn. Possible types are:

  1. "p" for points
  2. "l" for lines
  3. "b" for both (points and lines)
  4. "c" for the lines part alone of "b"
  5. "o" for overplotted
  6. "h" for histogram-like vertical lines
  7. "s" for stair steps
  8. "S" for other steps
  9. "n" for no plotting
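To compare some of these types, the sketch below draws the precipitation minus evapotranspiration values from the data frame example in a 2 x 2 grid, one panel per type. It uses a for loop (covered later in this course) and par(mfrow = ...) to arrange the panels. Both are illustrative choices, not part of the original slides.

```r
# P minus ETa values from the table edited earlier
years <- 2010:2017
p_minus_eta <- c(200, 204, 281, 175, 221, 259, 240, 173)

old.par <- par(mfrow = c(2, 2))  # 2 x 2 grid of plots; keep the old settings
for (tp in c("p", "l", "b", "s")) {
  plot(years, p_minus_eta, type = tp,
       main = paste0('type = "', tp, '"'), xlab = "Year", ylab = "P - ETa")
}
par(old.par)                     # restore the previous layout
```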

To show the first set of plotting examples, we will read a csv file with World Bank data for annual mean cereal crop yield for both the world and for Germany.

crop.yield <- read.csv("...add file path.../Data/Cereal_Yield.csv")
print(crop.yield)

When an object stores many entries, we can use the head and tail functions to look at the first six and the last six entries, respectively.

Note: the values presented are in kg/ha.

head(crop.yield)
##   Year Global_Cereal_Yield German_Cereal_Yield
## 1 1961            1431.537              2417.4
## 2 1962            1523.116              2962.2
## 3 1963            1589.004              2925.2
## 4 1964            1589.813              3120.8
## 5 1965            1639.062              2852.2
## 6 1966            1680.538              2878.0
tail(crop.yield)
##    Year Global_Cereal_Yield German_Cereal_Yield
## 52 2012            3619.562              6964.9
## 53 2013            3824.374              7318.0
## 54 2014            3892.360              8050.3
## 55 2015            3938.770              7497.8
## 56 2016            3967.029              7182.1
## 57 2017            4074.176              7269.9

Line charts

Let’s start with a line chart for the global values.

plot(crop.yield$Year, crop.yield$Global_Cereal_Yield, type = "l")

Note: we will get the same result by selecting the columns by number instead of by name, i.e., plot(crop.yield[, 1], crop.yield[, 2], type = "l").

Now, we will redo the plot in a more customised manner (chart title, axis titles and user-defined y-axis limits set with ylim).

plot(crop.yield$Year, crop.yield$Global_Cereal_Yield, type = "l",
main = "Mean Cereal Yield", xlab = "Year",
ylab = "Mean yield (kg/ha)", ylim = c(0, 10000))

The lines function is used to add another line to an existing plot. The legend function is used to add a legend to the plot.

lines(crop.yield$Year, crop.yield$German_Cereal_Yield, col = "red")
legend("topleft", c("Global", "Germany"), lty = 1, col = c("black", "red"))

Adding Points to a Graph

The points function is used to add points to an existing plot. pch selects the plotting character and col the plotting colour.

points(x = c(1990, 2000, 2010), y = c(3000, 4000, 6000),
        pch = 20, col = c("red", "blue", "black"))

Scatter Plots

Example: we can also plot the information using a scatter plot.

plot(crop.yield$Year, crop.yield$Global_Cereal_Yield, type = "p")

Note: the default plot type is "p" (points). Therefore, if we do not specify the type, we will get the same result.

Histograms

A histogram is a graphical representation of the distribution of numerical data. It can be used as an estimate of the probability distribution of a continuous variable.

Example: the daily temperature values (in °C) for April in a particular city are:

temp <- c(13, 14, 14, 15, 17, 14, 12, 14, 16, 
          25, 21, 19, 20, 20, 23, 23, 23, 21, 
          16, 17, 14, 14, 15, 16, 23, 16, 17, 
          27, 24, 21)

The function hist will generate a histogram.

hist(temp)

Again, we can customise the histogram.

hist(temp, main = "Histogram of temperature in April", xlab = "Temp [°C]",
border = "blue", col = "cyan", breaks = 8)

Remember to think carefully about how you plot and analyse your data. For example, the same temperature data can look quite different when plotted with 5 breaks and with 8 breaks.
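hist() treats a single number for breaks only as a suggestion. R then picks "pretty" breakpoints close to that number. The sketch below draws the April temperatures side by side with breaks = 5 and breaks = 8 and prints the breakpoints that were actually used.

```r
temp <- c(13, 14, 14, 15, 17, 14, 12, 14, 16,
          25, 21, 19, 20, 20, 23, 23, 23, 21,
          16, 17, 14, 14, 15, 16, 23, 16, 17,
          27, 24, 21)

old.par <- par(mfrow = c(1, 2))  # two histograms side by side
h5 <- hist(temp, breaks = 5, main = "breaks = 5", xlab = "Temp [°C]")
h8 <- hist(temp, breaks = 8, main = "breaks = 8", xlab = "Temp [°C]")
par(old.par)

# hist() returns the breakpoints and counts it actually used
h5$breaks
h8$breaks
```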

Boxplots

The function boxplot produces box-and-whisker plot(s) of (grouped) values.

boxplot(temp)
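A box plot is drawn from summary statistics that you can inspect with boxplot.stats(). Its stats component holds, in order, the lower whisker, the lower hinge, the median, the upper hinge and the upper whisker. Values lying more than 1.5 times the box length beyond the box are returned in out and drawn as individual points. For the April temperatures:

```r
temp <- c(13, 14, 14, 15, 17, 14, 12, 14, 16,
          25, 21, 19, 20, 20, 23, 23, 23, 21,
          16, 17, 14, 14, 15, 16, 23, 16, 17,
          27, 24, 21)

bs <- boxplot.stats(temp)
bs$stats  # 12 14 17 21 27
bs$out    # numeric(0): no outliers in these data
```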

Now, we will load daily values for maximum temperature in a pilot city.

daily.temp <- read.csv("C:/Users/Data/Daily_Temp.csv")
print(daily.temp)
##     Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
## 1  15.0 17.8 20.6 32.2 27.2 22.8 32.2 31.1 30.0 30.6 24.4 17.2
## 2  15.6 16.7 18.3 22.8 29.4 22.8 32.2 33.9 29.4 31.1 28.9 17.2
## 3  12.8 16.1 20.0 21.1 35.0 26.1 32.2 32.8 28.3 27.2 25.6 16.7
## 4  17.8 18.9 23.3 27.2 32.8 23.3 31.7 32.8 31.7 19.4 15.6 17.2
## 5  21.7 21.7 27.8 24.4 30.0 25.6 33.9 30.0 27.8 17.2 17.8 17.8
## 6  21.1 25.0 20.0 19.4 25.6 22.8 34.4 29.4 36.7 18.9 14.4 18.3
## 7  16.1 26.7 15.0 16.7 22.2 23.3 32.8 29.4 37.8 22.8 17.8 19.4
## 8  15.0 17.2 21.1 14.4 20.0 22.2 31.1 29.4 37.8 27.2 21.1 19.4
## 9  15.0 21.1 28.9 14.4 17.2 21.1 30.6 27.2 27.8 28.9 25.6 21.1
## 10 17.8 20.6 26.7 20.0 21.7 21.1 29.4 26.7 23.9 28.3 23.9 21.7
## 11 21.1 23.3 23.9 22.2 22.8 23.3 27.8 26.7 26.7 28.3 25.0 15.6
## 12 22.8 26.7 19.4 21.7 25.6 22.8 25.0 26.7 29.4 38.3 17.8 11.7
## 13 23.9 23.9 21.1 18.9 24.4 25.6 25.0 25.6 27.8 37.2 22.8 12.2
## 14 27.2 19.4 25.0 23.3 20.0 27.2 23.3 28.3 25.6 31.1 21.7 16.1
## 15 26.7 19.4 25.6 30.0 17.8 27.2 24.4 28.9 25.6 27.2 20.6 15.0
## 16 27.2 16.1 24.4 33.3 18.9 25.6 25.0 30.0 22.2 26.1 21.7 17.8
## 17 28.9 16.1 22.8 27.8 17.2 23.9 26.7 31.1 24.4 29.4 21.7 19.4
## 18 28.9 18.9 20.6 18.3 17.2 25.0 28.9 30.0 27.2 26.7 15.6 14.4
## 19 20.6 14.4 17.2 20.6 20.6 24.4 30.6 27.2 27.2 22.8 16.1 14.4
## 20 24.4 13.9 15.6 20.0 22.2 27.8 31.1 27.8 27.8 23.9 16.1 17.8
## 21 25.0 16.1 15.6 20.0 25.6 32.2 30.6 26.7 28.3 22.8 16.1 18.3
## 22 22.2 16.1 16.7 20.6 21.1 30.6 27.8 29.4 30.6 28.9 19.4 18.9
## 23 24.4 15.6 19.4 20.6 20.0 29.4 28.9 31.1 28.3 28.9 17.8 20.0
## 24 23.9 15.6 16.7 20.0 22.8 27.8 28.3 31.7 25.0 21.7 16.7 22.8
## 25 23.3 15.0 16.7 22.2 26.1 29.4 33.3 33.3 23.3 20.0 18.9 23.3
## 26 25.0 11.7 15.6 22.8 23.3 29.4 30.0 35.0 26.7 21.1 27.8 23.3
## 27 25.0 13.9 18.9 28.3 26.7 31.7 26.1 36.1 30.0 24.4 28.9 21.1
## 28 24.4 18.9 18.3 28.3 22.2 28.3 27.8 33.3 27.8 27.2 27.2 26.1
## 29 20.6   NA 23.3 22.8 17.8 26.7 34.4 35.0 27.8 28.3 26.7 25.6
## 30 15.0   NA 29.4 26.7 26.1 28.9 36.7 32.2 31.1 31.1 23.3 22.8
## 31 17.2   NA 32.2   NA 26.1   NA 29.4 30.0   NA 28.9   NA 25.0

To generate a boxplot with the data:

boxplot(daily.temp, main = "Daily Max. Temp of Pilot City in 2011",
col = "cyan", ylab = "Max temp (degrees C)")

Bar charts

The barplot function creates a bar chart with vertical or horizontal bars.

barplot(crop.yield$Global_Cereal_Yield)

Let’s customise the bar chart.

barplot(crop.yield$Global_Cereal_Yield, main = "Global Cereal Yield",
sub = "Data from World Bank", xlab = "Year", ylab = "Yield (kg/ha)", names.arg = crop.yield$Year)

Some other options to change how your data is presented in a bar chart:

  • col to add colour to the plot

  • horiz to create a horizontal bar chart (set to TRUE)

x <- c(8, 5, 10)
barplot(x, main = "Example of Horizontal Bar Chart",
names.arg = c("A", "B", "C"), horiz = TRUE,
xlab = "number", col = c("red", "blue", "yellow"))

Pie charts

The function pie draws a pie chart.

slices <- c(10, 12, 4, 16, 8)
country <- c("Canada", "UK", "Australia", "Germany", "France")
pie(slices, labels = country)

A more customised version:

pie(slices, labels = country, main = "Pie chart of countries",
col = rainbow(length(slices)))

Relational operators

R uses the following relational operators:

<     less than
>     greater than
<=    less than or equal to
>=    greater than or equal to
==    equal to
!=    not equal to

For example:

5 > 4
## [1] TRUE
2 + 2 == 5
## [1] FALSE
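Relational operators also work element-wise on whole vectors, which makes them useful for counting and subsetting. The sketch below applies them to the April temperatures from the histogram example. It also shows that floating-point results are approximate, so exact == comparisons can fail where all.equal(), which compares with a tolerance, succeeds.

```r
temp <- c(13, 14, 14, 15, 17, 14, 12, 14, 16,
          25, 21, 19, 20, 20, 23, 23, 23, 21,
          16, 17, 14, 14, 15, 16, 23, 16, 17,
          27, 24, 21)

temp > 20        # one TRUE/FALSE value per day
sum(temp > 20)   # 10: TRUE counts as 1 and FALSE as 0
temp[temp > 20]  # keep only the days above 20 °C

# Floating-point arithmetic is approximate, so == can surprise
sqrt(2)^2 == 2                   # FALSE
isTRUE(all.equal(sqrt(2)^2, 2))  # TRUE: comparison with a tolerance
```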

Iterative processes

Conditional statements and loops are very important because they allow us to do the following:

  • Run certain code only if a condition is met (if)

  • Run the same process a specified number of times (for)

    • A common example is to run the same code for each time step
  • Continue running a process until a condition is met (while)

If statements

R is able to perform conditional executions of the form:

if (conditional statement)  process 1 else process 2

The conditional statement must evaluate to a single logical value (TRUE or FALSE). Process 1 is run if the condition is met, while process 2 is run when the condition is not met.

If statements

For example:

if(5 > 4) print("Yes") else print("No")
## [1] "Yes"

Or:

if(5 > 4)
  print("Yes") else
    print("No")
## [1] "Yes"
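Because the condition of if must be a single logical value, it cannot test each element of a vector. Since R 4.2.0, a condition of length greater than one is an error. For element-wise decisions, the vectorised ifelse() function can be used instead, as in this sketch.

```r
x <- c(-2, 0, 3)

# if (x > 0) ... would fail here: the condition has three values
labels <- ifelse(x > 0, "positive", "not positive")
labels  # "not positive" "not positive" "positive"
```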

If a process requires multiple lines, it must be written inside curly brackets.

if(5 > 4){
  x <- 5
  y <- x * 10
  print(y)
} else {
  x <- 2
  y <- x * 5 
  print(y)
  
}
## [1] 50

Note that the else statement is not always required:

x <- 5
if(x > 4){
  y <- x *10
  print(y)
}
## [1] 50

An example of an else clause within an if statement is as follows:

input.value <- 32
if(input.value > 0){
  result <- log(input.value)
  paste("The log of", input.value, "is equal to", result)
} else {
  paste("Input value is not positive, therefore no log value can be calculated")
}
## [1] "The log of 32 is equal to 3.46573590279973"

Sometimes, we want to run some code if any one of a set of conditions is met. The | (OR) operator returns TRUE if at least one of the conditions is TRUE.

day <- "Monday"

if(day == "Saturday" | day == "Sunday"){
  response <- paste("No alarm set for the weekend")
} else {
  alarm.time <- "7am"
  response <- paste("My alarm will be set for", alarm.time)
}

print(response)
## [1] "My alarm will be set for 7am"
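When a value has to be compared against a list of alternatives, the %in% operator is a compact alternative to chaining | conditions. The sketch below rewrites the alarm example this way. The vector name weekend is an illustrative choice. For single TRUE/FALSE values, R also provides ||, which stops evaluating as soon as one condition is TRUE.

```r
day <- "Monday"
weekend <- c("Saturday", "Sunday")

# %in% is TRUE if day matches any element of weekend
if (day %in% weekend) {
  response <- "No alarm set for the weekend"
} else {
  alarm.time <- "7am"
  response <- paste("My alarm will be set for", alarm.time)
}

print(response)
```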

For loops

The for loop has the form:

for(variable in vector) process

In each iteration, the variable takes the next value in the vector and the process is run. For example:

for(i in 1:5) print(i)
## [1] 1
## [1] 2
## [1] 3
## [1] 4
## [1] 5

As with the if statement, if the process has more than one line of code, it must be enclosed in curly brackets.

for(i in 1:5){
  x <- i^2
  y <- x * 5
  print(y)
}
## [1] 5
## [1] 20
## [1] 45
## [1] 80
## [1] 125
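A common pattern is to store the result of each iteration in a vector created before the loop, indexing it with seq_along(). The sketch below recomputes precipitation minus evapotranspiration from the earlier table this way and checks that it matches the vectorised subtraction used in the data frame example.

```r
precip <- c(1250, 1952, 1328, 1459, 1642, 1743, 1238, 1432)
eta    <- c(1050, 1748, 1047, 1284, 1421, 1484, 998, 1259)

p.minus.eta <- numeric(length(precip))  # pre-allocate the result vector
for (i in seq_along(precip)) {          # seq_along gives 1, 2, ..., length(precip)
  p.minus.eta[i] <- precip[i] - eta[i]
}

identical(p.minus.eta, precip - eta)    # TRUE: same as the vectorised version
```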

A loop written inside another loop is called a nested loop:

for(i in 1:10){
  for (j in 1:5){
    a <- i + j
    cat(i, "plus", j, "is equal to", a, "\n")
  }
  cat("------------------------ \n")
}
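Nested loops are a natural way to fill a matrix, with the outer loop running over rows and the inner loop over columns. The sketch below stores the same sums i + j in a 10 x 5 matrix instead of printing them, then checks the result against outer(), which builds the same table without explicit loops.

```r
n.rows <- 10
n.cols <- 5
sums <- matrix(0, nrow = n.rows, ncol = n.cols)  # pre-allocated result matrix

for (i in 1:n.rows) {
  for (j in 1:n.cols) {
    sums[i, j] <- i + j  # row i, column j holds i + j
  }
}

sums[3, 4]                                    # 7
all(sums == outer(1:n.rows, 1:n.cols, "+"))   # TRUE: same table without loops
```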

A for loop can also be used in spatial time-series analysis, for example to apply the same processing to each time step.
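The sketch below assumes the time series is stored as a plain 3-D array (rows x columns x time steps) of hypothetical toy values. Real raster data would normally be handled with a dedicated package such as terra. The loop computes the spatial mean of each time step.

```r
# Hypothetical toy data: a 2 x 3 grid observed at 4 time steps
grid.series <- array(1:24, dim = c(2, 3, 4))

n.steps <- dim(grid.series)[3]
step.mean <- numeric(n.steps)       # one result per time step

for (step in seq_len(n.steps)) {
  layer <- grid.series[, , step]    # the 2 x 3 grid at this time step
  step.mean[step] <- mean(layer)
}

step.mean  # 3.5 9.5 15.5 21.5
```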

While loops

The while loop has the form:

while(condition) process

The condition is evaluated before each iteration; if it is TRUE, the process is run. When the condition is no longer met, the while loop stops.

For example:

delta <- 0.75
x     <- 1
while(delta < x){
  x <- x - 0.05
  print(x)
}
## [1] 0.95
## [1] 0.9
## [1] 0.85
## [1] 0.8
## [1] 0.75
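While loops are most useful when the number of iterations is not known in advance. The sketch below counts how many times a hypothetical starting value of 100 must be halved before it drops below 1. Halving is exact in binary floating point. By contrast, 0.05 in the example above cannot be stored exactly, so x only approximately reaches 0.75. Stopping conditions built from repeated decimal steps are therefore fragile.

```r
value <- 100
n.halvings <- 0

while (value >= 1) {            # checked before every iteration
  value <- value / 2
  n.halvings <- n.halvings + 1
}

n.halvings  # 7
value       # 0.78125
```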